Abstract

Background: The search for a habitable extrasolar planet has long interested scientists, but only recently 
have the tools become available to search for such planets. In the past decades, the number of known 
extrasolar planets has ballooned into the hundreds, and with it the expectation that the discovery of the 
first Earth-like extrasolar planet is not far off. 

Methodology/Principal Findings: Here we develop a novel metric of habitability for discovered
planets, and use this to arrive at a prediction for when the first habitable planet will be discovered.
Using a bootstrap analysis of currently discovered exoplanets, we predict the discovery of the first
Earth-like planet to be announced in the first half of 2011, with the likeliest date being early May 2011.

Conclusions/Significance: Our predictions, using only the properties of previously discovered
exoplanets, accord well with external estimates for the discovery of the first potentially habitable
extrasolar planet, and highlight the usefulness of predictive scientometric techniques to understand
the pace of scientific discovery in many fields.

Introduction 

The search for a habitable extrasolar planet has long interested scientists, but only recently have the 
observational tools become available to search for such planets [1]. Beginning in 1995 with the discovery
of an extrasolar planet around 51 Pegasi, a star much like our own [2], the number of confirmed extrasolar
planets has expanded into the hundreds. We now have a panoply of physical and orbital data on planets 
outside the solar system, and with it an increase in understanding of the formation of planetary systems. 
However, the holy grail of extrasolar planetary research - an Earth-like planet - has yet to be discovered. 
This search has more recently become even more intense with such large-scale surveys as NASA's Kepler 
mission [3].

While many, astronomers included, have speculated about when the first habitable planet might be 
discovered [4], no quantitative scientometric analysis has been performed. In order to do this, we develop
a metric of habitability for all discovered planets, and use this to arrive at a prediction for when the first 
habitable planet is expected to be discovered. 

Of course, predicting future scientific and technological progress is a slippery and difficult process. 
While there have been many such successes, such as Moore's Law [5], history is littered with predictions
that are far off the mark [6]. 

Here too there are many difficulties. Estimating the habitability of planets is itself a complicated 
process with many parameters [7], but most research dwells on the combination of two properties of a
planet: surface temperature necessary for liquid water, and planetary mass [8]. Using these guidelines,
we constructed a simple habitability metric. Using the habitability time series of previously discovered 
exoplanets, we created a bootstrap method to predict when the first Earth-like planet would be discovered. 
We predict the announcement of its discovery with a high probability by mid-year 2011. 
Materials and Methods
Habitability Metric 

Using the calculated mass and temperature of a planet, we constructed a simple habitability metric, H_k,
for a given planet k, where 0 is uninhabitable and 1 is an Earth-like planet. H_k is defined as follows:

H_k = H_k^{T(a)} H_k^{T(b)} H_k^{M}    (1)

where H_k is the product of three sub-measures, each itself on a scale of 0 to 1:

H_k^{T(a)} = habitability of surface temperature at a    (2)
H_k^{T(b)} = habitability of surface temperature at b    (3)
H_k^{M} = habitability of planetary mass    (4)

The formulas for the sub-measures that make up H_k are below, where M_⊕ is one Earth mass, W_T is the
approximate width of half the acceptable temperature range (here we used 75 K), T_0 is the midpoint of
the range (T_0 = 323 K), W_M = 0.5, and, for H_k^{T(x)}, x can be either a (semi-major axis) or
b (semi-minor axis) of planet k:



[Equations (5) and (6), which define the sub-measures H_k^{T(x)} and H_k^{M}, are not legible in this copy.]

Each sub-measure is simply the product of two opposing modified logistic curves with some rescaling,
which yields functions of the forms seen in Fig. 1. Note that while a positive value is assigned
to negative masses, this is simply a byproduct of the symmetric nature of the function, and does not
affect the calculations, as there are no planets with negative mass. For H to be 1, all sub-measures
must themselves be 1. Due to the "step-function-like" nature of these sub-measures, H is likely to be
either very low (near 0) or very close to 1. Additional separate conditions (such as presence or absence
of atmosphere) would make the model more restrictive and necessarily lower H, so our model errs
optimistically on habitability and is an upper bound on H.
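
As an illustration only, the following Python sketch shows one way a sub-measure of this kind could be built from two opposing logistic curves and rescaled to peak at 1. The exact functional form, the helper names (`window`, `habitability`), and both steepness constants are assumptions made for this sketch; only T_0 = 323 K, W_T = 75 K, W_M = 0.5 and the product structure of H_k come from the text.

```python
import math

# Hypothetical steepness values (not from the paper); larger means more step-like.
TEMP_STEEPNESS = 0.5    # per kelvin (assumed)
MASS_STEEPNESS = 40.0   # per Earth mass (assumed)

def logistic(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def window(x, center, half_width, steepness):
    """Product of a rising and a falling logistic, rescaled to equal 1 at `center`.

    The curve rises near center - half_width and falls near center + half_width.
    This form is an assumed stand-in for the paper's sub-measures.
    """
    rise = logistic(steepness * (x - (center - half_width)))
    fall = logistic(-steepness * (x - (center + half_width)))
    peak = logistic(steepness * half_width) ** 2  # unscaled value at x == center
    return rise * fall / peak

def habitability(temp_a, temp_b, mass_earths, t0=323.0, w_t=75.0, w_m=0.5):
    """H_k as the product of two temperature sub-measures and one mass sub-measure."""
    h_ta = window(temp_a, t0, w_t, TEMP_STEEPNESS)
    h_tb = window(temp_b, t0, w_t, TEMP_STEEPNESS)
    h_m = window(mass_earths, 1.0, w_m, MASS_STEEPNESS)
    return h_ta * h_tb * h_m
```

Because the three factors are multiplied, a planet that fails any one condition (for example, a hot Jupiter) receives a value of H near 0 regardless of the other two.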

Note that while a nominal temperature of T = 400 K exceeds the boiling point of water, this value
is representative only of the simple blackbody equilibrium temperature at the substellar point. Actual
surface temperatures for potentially habitable planets will be controlled by a host of effects, some
well-understood, some entirely speculative: The stellar flux is intercepted by an area πR², where R is
the planetary radius, but must warm a planet of surface area 4πR². Our assigned limits on potential
habitability correspond to a habitable zone whose outer radius is a factor of 2.6 times larger than its
inner radius. Given the uncertainties on what constitutes the range of potentially habitable
environments, this ratio is intended to be somewhat optimistic. Estimates by Mischna et al. (2000) [9],
for example, advocate a ratio of 2.1, whereas the influential study of Kasting et al. (1993) [10] found
a ratio of bounding distances equal to 1.76. The atmosphere may provide a significant greenhouse warming
effect. The planet may have endogenous sources of energy, and cloud cover can provide a significant
reflective albedo. For a recent detailed discussion of the geophysical and atmospheric factors relevant
to potentially habitable planets, see, e.g., Lammer et al. (2010) [11].
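
The factor of 2.6 quoted above appears to follow from the temperature limits T_0 ± W_T (248 K and 398 K for T_0 = 323 K, W_T = 75 K), under the assumption that the blackbody temperature scales as the inverse square root of orbital distance. A short worked version of that arithmetic, with a_in and a_out as labels introduced here for the inner and outer bounding distances:

```latex
T \propto a^{-1/2}
\quad\Longrightarrow\quad
\frac{a_{\mathrm{out}}}{a_{\mathrm{in}}}
  = \left(\frac{T_0 + W_T}{T_0 - W_T}\right)^{2}
  = \left(\frac{398\,\mathrm{K}}{248\,\mathrm{K}}\right)^{2}
  \approx 2.6
```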

Figure 1. Habitability metric curves. A. Habitability metric for temperature, H^{T(x)}, as
temperature varies. B. Habitability metric for mass, H^{M}, as mass, in Earth masses, varies.

Calculations of planetary surface temperature involve assumptions of stars on the Main Sequence and
black body radiation models, and were as follows [11]:

T_x = \left[ \frac{ \left( M_*/M_\odot \right)^{3.5} \left( 3.839 \times 10^{26}\,\mathrm{W} \right) }{ 4\pi\sigma_{sb} \left[ \left( x/R_\odot \right) \left( 6.955 \times 10^{8}\,\mathrm{m} \right) \right]^{2} } \right]^{1/4}

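
A minimal Python sketch of a blackbody temperature calculation of this kind is given below. It assumes a main-sequence mass-luminosity relation with exponent 3.5 (a common textbook approximation, adopted here as an assumption) and takes the orbital distance in metres; the function names and the helper that derives the semi-minor axis from eccentricity are illustrative and not taken from the paper.

```python
import math

L_SUN = 3.839e26      # solar luminosity, W
SIGMA_SB = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
AU = 1.496e11         # astronomical unit, m

def substellar_temperature(star_mass_suns, distance_m, ml_exponent=3.5):
    """Blackbody equilibrium temperature at the substellar point, in kelvin.

    Luminosity is estimated from stellar mass via L = L_sun * M^ml_exponent
    (an assumed main-sequence relation), then T = (L / (4 pi sigma d^2))^(1/4).
    """
    luminosity = L_SUN * star_mass_suns ** ml_exponent
    flux = luminosity / (4.0 * math.pi * distance_m ** 2)
    return (flux / SIGMA_SB) ** 0.25

def semi_minor_axis(semi_major_m, eccentricity):
    """Semi-minor axis b = a * sqrt(1 - e^2) of an elliptical orbit."""
    return semi_major_m * math.sqrt(1.0 - eccentricity ** 2)

# Example: a planet at 1 AU around a 1 solar-mass star.
t_a = substellar_temperature(1.0, AU)
t_b = substellar_temperature(1.0, semi_minor_axis(AU, 0.0167))
```

For a Sun-like star at 1 AU this gives a temperature a little under 400 K, which is consistent with the nominal 400 K substellar value discussed in the text.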

Using these metrics, H_k was calculated from readily available data [12] for all 370 planets in the
dataset.

Discovery Date Prediction 

In order to create a robust estimate of the date of discovery of the first Earth-like planet, the following 
factors were considered: 

1. The extrasolar planets considered were detected by two methods: radial velocity (RV) and transit. 
The radial velocity method detects a planet using the Doppler effect to determine the motion of 
the planet's star, and the transit method detects a planet using changes in brightness of the star 
due to the planet's transit in front of it. While the transit method provides accurate estimates of 
the mass, M, of a planet, the radial velocity method yields an estimate for M sin(i), where i is an
unknown inclination for the planetary system, thereby only giving a lower bound for M. 
2. Any estimate of the detection of the first Earth-like planet will necessarily be dependent on the
vagaries of those planets which were previously discovered. If other stars were examined, a different
set of planets, and therefore H values, would have been found.

A bootstrap analysis was conducted, which accounts for the M estimates for planets discovered by 
the radial velocity method, and provides a robust estimate of the date of discovery. Each realization 
consisted first of calculating M for each RV-detected planet in the dataset by drawing an inclination 
randomly chosen from the surface of the sphere. We then sampled 370 planets, each with its year of 
discovery, from the complete planetary data set with replacement, in order to create a bootstrapped time 
series of H values of exoplanet discovery. The date of discovery chosen for each extrasolar planet was the 
mid-point of the year of discovery (including for 2010), due to the lack of more precise data. 
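
The two steps of a single realization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (dictionaries with hypothetical keys `year`, `mass`, `method`) is assumed, and an isotropic inclination is drawn by sampling cos(i) uniformly, which is the standard way to pick a random orientation on a sphere.

```python
import math
import random

def draw_true_mass(m_sin_i, rng):
    """Draw an isotropic inclination and return M = (M sin i) / sin i."""
    cos_i = rng.random()                    # uniform cos(i) <=> isotropic orientation
    sin_i = math.sqrt(1.0 - cos_i ** 2)
    return m_sin_i / sin_i

def bootstrap_realization(planets, rng):
    """Return one bootstrapped sample of the planet list.

    `planets` is a list of dicts with hypothetical keys:
      'year'   - discovery date (mid-year, e.g. 2009.5)
      'mass'   - M for transit planets, M sin(i) for RV planets
      'method' - 'rv' or 'transit'
    """
    realized = []
    for p in planets:
        mass = p["mass"]
        if p["method"] == "rv":
            mass = draw_true_mass(mass, rng)   # replace lower bound with a drawn mass
        realized.append({**p, "mass": mass})
    # Resample as many planets as in the dataset, with replacement.
    return rng.choices(realized, k=len(realized))

# Example with a tiny made-up dataset.
rng = random.Random(0)
sample = bootstrap_realization(
    [{"year": 2009.5, "mass": 2.0, "method": "rv"},
     {"year": 2010.5, "mass": 1.0, "method": "transit"}],
    rng,
)
```

H would then be recomputed for every planet in `sample` to obtain the bootstrapped time series.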

To predict when the first habitable planet (H ≈ 1) will be discovered, we examined the upper envelope
of each realization's H values by year (the points described by the highest habitability metric for each
year). Since the upper and lower bounds of H are known to be 1 and 0, respectively, fitting a logistic
equation is appropriate, and is similar to many other discovery curves, such as the number of mammalian
species or number of chemical elements [13, 14]. A logistic best fit of the upper envelope of H over
time, H(t), where the parameters R and y were allowed to vary, is as follows:

H(t) = \frac{1}{1 + e^{-R(t - y)}}
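
A small Python sketch of the upper envelope and of the quantity a least-squares fit would minimize is shown below. It assumes the logistic form H(t) = 1 / (1 + exp(-R (t - y))); the function names are illustrative, and the fitting routine itself (any non-linear least squares optimizer) is deliberately left out.

```python
import math

def upper_envelope(points):
    """Return sorted (year, max H) pairs from (year, H) observations."""
    best = {}
    for year, h in points:
        best[year] = max(h, best.get(year, 0.0))
    return sorted(best.items())

def logistic_h(t, rate, midpoint):
    """Logistic curve bounded by 0 and 1, written to avoid overflow."""
    z = rate * (t - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sum_squared_error(envelope, rate, midpoint):
    """Least-squares objective for candidate parameters (R, y)."""
    return sum((h - logistic_h(t, rate, midpoint)) ** 2 for t, h in envelope)

# Example with made-up H values for three discovery years.
envelope = upper_envelope([(2008.5, 0.0), (2009.5, 0.001), (2009.5, 0.01), (2010.5, 0.0)])
error = sum_squared_error(envelope, rate=28.78, midpoint=2011.10)
```

An optimizer would vary `rate` and `midpoint` to minimize `sum_squared_error`; because the envelope is nearly flat until it rises, many parameter pairs give similar errors, which is one way to see why a fit of this kind can be sensitive to its starting values.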
The logistic curves of best fit were obtained using non-linear least squares fits on the sampled H
values. Due to the step-like form of H(t), the best-fit curves were extremely sensitive to initial
conditions. While parameters close to the final fits were chosen for precision, variation in the results can
be introduced by choosing different initial conditions. To determine the date as a fraction of the year,
the value at which the logistic function first reached H(t) = 0.999 was calculated. To test the robustness
of this assumption, H(t) = 0.99 was used, which predicted early April 2011, and H(t) = 0.9999,
which predicted early June 2011.

Other robustness checks were performed as well. With bounds of either W_T = 50 K or W_M = 0.5 in
the habitability equation, a prediction of discovery in early May 2010 was found. It is likely then that
the precision of the fit, along with the data available, are what drive the prediction. Assuming an error 
of a month in either direction is therefore reasonable. In order to convert this to days and months of a 
year, non-leap years were assumed. 
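
The threshold step and the conversion to a calendar date can be sketched in Python as follows, assuming the logistic form H(t) = 1 / (1 + exp(-R (t - y))) and 365-day years. The function names are illustrative; the parameter values R = 28.78 and y = 2011.10 are those reported for the single example realization in Fig. 3.

```python
import math
from datetime import date, timedelta

def threshold_time(h, rate, midpoint):
    """Solve h = 1 / (1 + exp(-rate * (t - midpoint))) for t."""
    return midpoint + math.log(h / (1.0 - h)) / rate

def fractional_year_to_date(t):
    """Convert a fractional year to a calendar date, assuming 365-day years."""
    year = int(t)
    day_offset = int((t - year) * 365)
    return date(year, 1, 1) + timedelta(days=day_offset)

# Thresholds used in the text, applied to the Fig. 3 example parameters.
for h in (0.99, 0.999, 0.9999):
    t = threshold_time(h, rate=28.78, midpoint=2011.10)
    print(h, round(t, 2), fractional_year_to_date(t))
```

With these example parameters the three thresholds fall in April, May and June 2011 respectively, in line with the dates quoted in the text.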



Results and Discussion 

We examined 370 exoplanets, all of which have well-characterized properties. Doing so yields H = 0
for the majority of exoplanets. Notably, Gliese 581 d, thought to be in the habitable zone, yields the
highest value, about H = 0.01, which is still quite low. While some authors (e.g. Wordsworth et al.
2010 [15]) actually argue that Gliese 581 d is potentially habitable, we are of the opinion that its measured
M sin(i) = 7.1 M_⊕ leads to an expected mass close to 10 Earth masses, and a possibly water-dominated
composition more akin to an ice giant planet such as Uranus or Neptune than to a terrestrial planet like
the Earth.

We conducted 10,000 realizations of the bootstrap method, where the data could successfully be fit 
to a curve, in order to arrive at a distribution of dates of discovery of the first habitable planet. This 
distribution is heavy-tailed, as seen in Fig. 2, with a median date of discovery of early May 2011 (2011.34).
Additionally, detecting an Earth-like planet by the end of 2013 has a probability of 2/3; the probability
reaches 75% in 2020, and does not reach 95% until 2264.

An example realization is shown in Fig. 3, where the best-fit logistic curve arrives at H = 1 (the
upper border) in the first half of the year 2011. More precisely, it reaches H ≈ 1 at about one-third
(0.34) through the year, which is early May 2011.
Figure 2. Distribution of 10,000 realizations of bootstrap analysis by year. Inset shows the
cumulative probability distribution of the year of discovery for 2011-2030.

Figure 3. A single realization of the habitability of extrasolar planets over time. H values
for the extrasolar planets are plotted, with those of the upper envelope (maximum H for a given year of
discovery) indicated in black. The black curve is the logistic best-fit curve of the upper envelope, using
a nonlinear model, where R = 28.78 and y = 2011.10. The horizontal grey line indicates the maximum
value of H = 1, the presence of an Earth-like habitable planet.



Additionally, we conducted the same bootstrap analysis for subsets of the planetary dataset up to 
the end of the years 2001-2010 (prior to this, there is not enough data to yield robust estimates). This 
allows us to determine what the likeliest date of the discovery of an Earth-like planet would have been 
predicted to be, if the analysis were conducted throughout the previous decade. The median dates of 
discovery are shown in Fig. 4. 

The creation of a single metric of habitability, H , allows for quantitative prediction of when the first 
Earth-like planet is expected to be discovered - in this case, a date of early May 2011. Of course, this 
prediction of when the discovery of the first Earth-like planet will be announced has ignored technological 
advancement entirely, as well as many other factors. However, technological progress can often be
well-described by a functional form independent of the processes underlying its advancement [16]. Similarly,
it is likely that the multiple methods of extrasolar planet discovery (such as radial velocity and transit
methods) combine to yield relatively smooth curves on the march towards further discovery. The testable
prediction given here is likely to be found accurate in the coming months, given the recent launch and
ongoing results of many projects.

Figure 4. Median date of discovery using planetary data up to the end of a given year.
Results are from a bootstrap analysis for the years 2001-2010.

A great deal of current interest is focused on NASA's ongoing Kepler mission [17]. The Kepler
spacecraft employs the photometric transit method to detect planet candidates, and it is an open question
as to whether this method can achieve the first detection of a planet with H ≈ 1. While the initial
results of Kepler were released on June 15, 2010, the Kepler team has delayed publication of 400 of the
most promising extrasolar planetary candidates until February 2011. Within this large pool of withheld
candidates, it is virtually certain that some have radii that are observationally indistinguishable from
Earth's radius. It is likely, however, that because of the limited time baseline of the mission to date, the
Kepler planet candidates to be published in February 2011 may be too hot to support significant values of
H.

In order to determine how useful Kepler will be in the search for a habitable planet, we reran our 
prediction analysis using only those planets discovered using the transit method. The method is
unable to converge on a likely date of discovery, due to the paucity of the data (62 planets)
and the low H values for these planets. No doubt Kepler will increase this number of planets, but this
result provides a counterbalance to the assumption that the Kepler team will discover the first habitable planet.

It must be noted that by publicizing our prediction, there is a concern that it will become accurate
simply due to the well-studied Hawthorne Effect [18]. However, due to the large number of observations
and long periods of time required to confirm an extrasolar planet discovery, it is unlikely that our
prediction at this time will appreciably affect the announcement of the discovery of an Earth-like planet.

Therefore, it is reasonable to use the habitability metric curve as a rough prediction for when the first 
potentially habitable planet will be discovered, in this case, as early as May 2011, and likely by the end 
of 2013. 